Geospatial analysis of clinical trial sites in the USA:

Exploring Demographics and Vulnerabilities

Author - Priyadarshini Satish

Date – 03/10/2023

Course - GGIS 407 - Cyber GIS and Geospatial Data Science

Professor - Dr. Anand Padmanabhan

Introduction

In the medical and pharmaceutical industry, selecting sites for clinical trials is a critical task that involves analyzing proprietary information about demographics, geography, economy, and market dynamics. A comprehensive list of clinical trial sites published by the US National Library of Medicine is available online. This data set includes information about clinical trials across the world that have been registered on the website since 2000. While it is not exhaustive, it lists all studies that require registration under the FDAAA, that is, studies of behaviors or of the effect of a drug or procedure on human volunteers.

In the present study, this data set is filtered for sites in the United States, and the city of each site is plotted on a map. This geospatial data is visualized as a heat map and compared with county-level socio-economic indicators from the US Census Bureau. The data retrieved from the census include median age, health insurance coverage, median household income, and poverty rate. The expected result is insight into where pharmaceutical companies tend to locate sites for such studies. Further, we can explore whether certain vulnerabilities, such as lower income, are exploited when locating such sites to attract more participants.
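As a minimal sketch of the filtering step described above, the snippet below keeps only the US rows of a site table. The sample rows and the country label `"United States"` are assumptions made for illustration; the `country` column name follows the grouping step used later in this notebook.

```python
import pandas as pd

# Hypothetical sample rows standing in for the full site list.
sites = pd.DataFrame({
    "facility_name": ["Clinic A", "Clinic B", "Clinic C"],
    "city": ["Chicago", "Toronto", "Houston"],
    "country": ["United States", "Canada", "United States"],
})

# Keep only the sites located in the United States
# (the exact country label is an assumption).
us_sites = sites[sites["country"] == "United States"].reset_index(drop=True)
print(us_sites)
```

Resetting the index keeps the row labels contiguous after the filter, which makes later positional access less error-prone.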

Data Visualization

In [ ]:
#import necessary libraries
import pandas as pd
import geopandas as gpd
import math
import folium
import zipfile
from folium import Choropleth, Circle, Marker
from folium.plugins import HeatMap, MarkerCluster
In [ ]:
with zipfile.ZipFile('Geodata.zip','r') as zip_ref:
    zip_ref.extractall()
In [ ]:
censusdata = pd.read_excel("censusdata.xlsx")
#Excel data from the previous section was converted to GeoJSON using ArcGIS Pro. That file is loaded into the notebook here.
geodata = gpd.read_file('CT_geo.geojson')
In [ ]:
#Creating groups of unique cities to get a count of facilities by city 
bubbledata = geodata.groupby(['city','state','country']).agg(
    count_of_facilities = ('facility_name','count'),
    lat = ('lat','first'),
    lon = ('lng','first'),
    geometry = ('geometry','first')).reset_index()

The cell above groups the facilities by city to get a count of how many facilities are in each city.
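To make the grouping concrete, here is a small sketch of the same named-aggregation pattern on made-up rows. The facility names and coordinates are hypothetical; only the column names mirror the ones used in this notebook.

```python
import pandas as pd

# Hypothetical facilities: two in Chicago, one in Houston.
facilities = pd.DataFrame({
    "facility_name": ["Site 1", "Site 2", "Site 3"],
    "city": ["Chicago", "Chicago", "Houston"],
    "state": ["Illinois", "Illinois", "Texas"],
    "country": ["United States"] * 3,
    "lat": [41.88, 41.88, 29.76],
    "lng": [-87.63, -87.63, -95.37],
})

# One row per city: count the facilities and keep the first coordinates seen.
city_counts = facilities.groupby(["city", "state", "country"]).agg(
    count_of_facilities=("facility_name", "count"),
    lat=("lat", "first"),
    lon=("lng", "first"),
).reset_index()
print(city_counts)
```

Taking the `first` coordinate per group is reasonable here because every facility in a city shares that city's coordinates.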

In [ ]:
# Create a map indicating the location of all the cities and the count of facilities in each.
m_4 = folium.Map(location=[48,-102], tiles='cartodbpositron', zoom_start=3)

# Add one clustered marker per city, skipping rows that have no coordinates.
# Each marker is added only once, with a popup showing the city and its facility count.
mc = MarkerCluster()
for idx, row in bubbledata.iterrows():
    if not math.isnan(row['lon']) and not math.isnan(row['lat']):
        folium.Marker(
            location=[row['lat'], row['lon']],
            popup=f"{row['city']}: {row['count_of_facilities']} facilities",
        ).add_to(mc)
m_4.add_child(mc)
# Display the map
m_4

The map above gives an overview of where the sites are located. Zoom in and click on markers to see city names, as well as the number of facilities in each of them.

In [ ]:
m_4.save("city_count.html") #save to a file
In [ ]:
# Create a base map
m_5 = folium.Map(location=[48,-102], tiles='cartodbpositron', zoom_start=4)

#Add a heatmap to the base map (rows without coordinates are dropped, since HeatMap does not accept NaN values)
HeatMap(data=geodata[['lat', 'lng']].dropna(), radius=10).add_to(m_5)

# Display the map
m_5

The map above is a heat map showing where site clusters are most concentrated. Green colour shows greater density of sites. The data used contain over 150,000 clinical trial records, and multiple sites in a single city add up to show a greater density of sites in that region.
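One way to express the aggregation described above is to give each city a single weighted point instead of one point per facility. The sketch below is an assumption about how that could be done, not the method used in this notebook: it builds `[lat, lon, weight]` triples, a format that folium's `HeatMap` accepts, from a made-up table shaped like `bubbledata`.

```python
import pandas as pd

# Hypothetical per-city counts; the last row has no coordinates.
city_counts = pd.DataFrame({
    "city": ["Chicago", "Houston", "Unknown"],
    "lat": [41.88, 29.76, None],
    "lon": [-87.63, -95.37, None],
    "count_of_facilities": [2, 1, 5],
})

# Drop rows without coordinates, then build [lat, lon, weight] triples.
located = city_counts.dropna(subset=["lat", "lon"])
heat_points = located[["lat", "lon", "count_of_facilities"]].to_numpy().tolist()
print(heat_points)
```

The resulting list could be passed as the `data` argument of `HeatMap` in place of the unweighted coordinate pairs.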

In [ ]:
#Import Libraries
import geopandas as gpd
import pandas as pd
import numpy as np
import folium
from folium.features import GeoJsonTooltip
from pygris import counties 
from pygris.utils import shift_geometry
from matplotlib import pyplot as plt
In [ ]:
us_counties = counties(cb = True, resolution = "20m", cache = True, year = 2019)
us_counties['GEOID'] = us_counties['GEOID'].astype(int)
merged_data = us_counties.merge(censusdata, on = "GEOID")
In [ ]:
#Create four FeatureGroup layers, one per indicator
#fg3 holds the median age layer and fg4 the median household income layer, matching the cells below
us_map = folium.Map(location=[40, -102], zoom_start=4,tiles='cartodbpositron')
fg1 = folium.FeatureGroup(name='Percent below Poverty',overlay=False).add_to(us_map)
fg2 = folium.FeatureGroup(name='Percent covered by Health Insurance',overlay=False).add_to(us_map)
fg3 = folium.FeatureGroup(name='Median Age',overlay=False).add_to(us_map)
fg4 = folium.FeatureGroup(name='Median Household Income',overlay=False).add_to(us_map)
In [ ]:
#Add the first choropleth map layer to fg1
custom_scale1 = (merged_data['poverty_perc'].quantile((0,0.2,0.4,0.6,0.8,1))).tolist()
Poverty=folium.Choropleth(
            geo_data=us_counties,
            data=merged_data,
            columns=['GEOID', 'poverty_perc'],  
            key_on='feature.properties.GEOID', 
            threshold_scale=custom_scale1, #use the custom scale we created for legend
            fill_color='YlOrRd',
            nan_fill_color="White", #Use white color if there is no data available for the county
            fill_opacity=0.7,
            line_opacity=0.2,
            legend_name='Percent of Population Below Poverty Level',
            highlight=True,
            overlay=False,
            line_color='black').geojson.add_to(fg1)

#Add customized tooltips to the map
folium.features.GeoJson(
                    data=merged_data,
                    name='Percent of Population Below Poverty Level',
                    smooth_factor=2,
                    style_function=lambda x: {'color':'black','fillColor':'transparent','weight':0.5},
                    tooltip=folium.features.GeoJsonTooltip(
                        fields=['NAME',
                                'poverty_perc',],
                        aliases=["County Name:",
                                 "Percent below poverty:"],
                        localize=True,
                        sticky=False,
                        labels=True,
                        style="""
                            background-color: #F0EFEF;
                            border: 2px solid black;
                            border-radius: 3px;
                            box-shadow: 3px;
                        """,
                        max_width=800,),
                            highlight_function=lambda x: {'weight':3,'fillColor':'grey'},
                        ).add_to(Poverty) 
In [ ]:
#Add the second choropleth map layer to fg2
custom_scale2 = (merged_data['covered_perc'].quantile((0,0.2,0.4,0.6,0.8,1))).tolist()
Insurance=folium.Choropleth(
            geo_data=us_counties,
            data=merged_data,
            columns=['GEOID', 'covered_perc'],  #Here we tell folium to use the county GEOID and plot the 'covered_perc' metric for each county
            key_on='feature.properties.GEOID', #Here we match the county boundaries in geo_data to the data using the 'GEOID' property
            threshold_scale=custom_scale2, #use the custom scale we created for legend
            fill_color='YlOrRd',
            nan_fill_color="White", #Use white color if there is no data available for the county
            fill_opacity=0.7,
            line_opacity=0.2,
            legend_name='Percent of Population covered by Health Insurance',
            highlight=True,
            overlay=False,
            line_color='black').geojson.add_to(fg2)

#Add customized tooltips to the map
folium.features.GeoJson(
                    data=merged_data,
                    name='Percent of Population covered by Health Insurance',
                    smooth_factor=2,
                    style_function=lambda x: {'color':'black','fillColor':'transparent','weight':0.5},
                    tooltip=folium.features.GeoJsonTooltip(
                        fields=['NAME',
                                'covered_perc'],
                        aliases=["County Name:",
                                 "Percent Covered by Insurance:"],
                        localize=True,
                        sticky=False,
                        labels=True,
                        style="""
                            background-color: #F0EFEF;
                            border: 2px solid black;
                            border-radius: 3px;
                            box-shadow: 3px;
                        """,
                        max_width=800,),
                            highlight_function=lambda x: {'weight':3,'fillColor':'grey'},
                        ).add_to(Insurance) 
In [ ]:
#Add the third choropleth map layer to fg3
custom_scale3 = (merged_data['median_age'].quantile((0,0.2,0.4,0.6,0.8,1))).tolist()
Age=folium.Choropleth(
            geo_data=us_counties,
            data=merged_data,
            columns=['GEOID', 'median_age'],  #Here we tell folium to use the county GEOID and plot the 'median_age' metric for each county
            key_on='feature.properties.GEOID', #Here we match the county boundaries in geo_data to the data using the 'GEOID' property
            threshold_scale=custom_scale3, #use the custom scale we created for legend
            fill_color='YlOrRd',
            nan_fill_color="White", #Use white color if there is no data available for the county
            fill_opacity=0.7,
            line_opacity=0.2,
            legend_name='Median Age of Population',
            highlight=True,
            overlay=False,
            line_color='black').geojson.add_to(fg3)

#Add customized tooltips to the map
folium.features.GeoJson(
                    data=merged_data,
                    name='Median Age of Population',
                    smooth_factor=2,
                    style_function=lambda x: {'color':'black','fillColor':'transparent','weight':0.5},
                    tooltip=folium.features.GeoJsonTooltip(
                        fields=['NAME',
                                'median_age'],
                        aliases=["County Name:",
                                 "Median Age:"],
                        localize=True,
                        sticky=False,
                        labels=True,
                        style="""
                            background-color: #F0EFEF;
                            border: 2px solid black;
                            border-radius: 3px;
                            box-shadow: 3px;
                        """,
                        max_width=800,),
                            highlight_function=lambda x: {'weight':3,'fillColor':'grey'},
                        ).add_to(Age) 
In [ ]:
#Add the fourth choropleth map layer to fg4
custom_scale4 = (merged_data['median_HHI'].quantile((0,0.2,0.4,0.6,0.8,1))).tolist()
Income=folium.Choropleth(
            geo_data=us_counties,
            data=merged_data,
            columns=['GEOID', 'median_HHI'],  #Here we tell folium to use the county GEOID and plot the 'median_HHI' metric for each county
            key_on='feature.properties.GEOID', #Here we match the county boundaries in geo_data to the data using the 'GEOID' property
            threshold_scale=custom_scale4, #use the custom scale we created for legend
            fill_color='YlOrRd',
            nan_fill_color="White", #Use white color if there is no data available for the county
            fill_opacity=0.7,
            line_opacity=0.2,
            legend_name='Median Household Income',
            highlight=True,
            overlay=False,
            line_color='black').geojson.add_to(fg4)

#Add customized tooltips to the map
folium.features.GeoJson(
                    data=merged_data,
                    name='Median HHI of Population',
                    smooth_factor=2,
                    style_function=lambda x: {'color':'black','fillColor':'transparent','weight':0.5},
                    tooltip=folium.features.GeoJsonTooltip(
                        fields=['NAME',
                                'median_HHI'],
                        aliases=["County Name:",
                                 "Median HHI:"],
                        localize=True,
                        sticky=False,
                        labels=True,
                        style="""
                            background-color: #F0EFEF;
                            border: 2px solid black;
                            border-radius: 3px;
                            box-shadow: 3px;
                        """,
                        max_width=800,),
                            highlight_function=lambda x: {'weight':3,'fillColor':'grey'},
                        ).add_to(Income) 
In [ ]:
#Add layer control to the map
folium.TileLayer('cartodbdark_matter',overlay=True,name="View in Dark Mode").add_to(us_map)
folium.TileLayer('cartodbpositron',overlay=True,name="View in Light Mode").add_to(us_map)
folium.LayerControl(collapsed=False).add_to(us_map)
us_map
us_map.save("index.html") #save to a file
In [ ]:
us_map

The choropleth map above shows the socio-economic indicators of every county in the USA. Depending on the layer selected, darker regions indicate a higher poverty rate, higher health insurance coverage, higher median household income, or higher median age. You can interact with the map and see the indicators for each county as you hover over it.
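The colour classes in each layer come from quintile thresholds computed with `quantile`. As a rough illustration of how those thresholds split counties into five classes, the sketch below uses made-up poverty percentages and `pd.cut`. The use of `pd.cut` is an assumption added for illustration; folium does its own binning internally.

```python
import pandas as pd

# Hypothetical poverty percentages for five counties.
poverty = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], name="poverty_perc")

# Same quantile levels as the custom scales used for the choropleth layers.
thresholds = poverty.quantile([0, 0.2, 0.4, 0.6, 0.8, 1]).tolist()

# Assign each county to a class from 0 (lowest fifth) to 4 (highest fifth).
classes = pd.cut(poverty, bins=thresholds, labels=False, include_lowest=True)
print(thresholds)
print(classes.tolist())
```

Because the thresholds are quantiles, each class holds roughly the same number of counties, so the map shows relative rank rather than absolute differences between counties.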